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(N : Abstract. We show that for primes p < 10^^ the period length k{p^) of the 

^ , Fibonacci sequence modufo is never equal to its period length modulo p. 

The investigation involves an extensive search by computer. As an application, 
we establish the general formula = k{p) ■ p"~^ for all primes less than 10^^. 
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1. Introduction 



■ 1.1. The Fibonacci sequence {Fk}k>o is defined recursively by Fq = 0, Fi = 1, 

and Fk = F^-i + Fk-2 for k > 2. Modulo some integer / > 2, it must ultimately 
become periodic as there are only different pairs of residues modulo /. There is no 
pre-period since the recursion may be reversed to Fk-2 = Fk — Fk-i- The minimal 
I period «:(/) of the Fibonacci sequence modulo / is often called the Wall number as 

^ ' its main properties were discovered by D. D. Wall [Wa]. 

^ ! Wall's results may be summarized by the theorem below. It shows, in particular, 

^ I that k{1) is in general a lot smaller than P. In fact, one always has k{1) < 61 

Q I whereas equality holds if and only if Z = 2 ■ 5" for some n > 1. 

Theorem 1.2 (Wall), a) // gcd(/i,/2) = 1 then nikk) = lcm(K(/i), ^(/s)). 

O ! In particular, if I = YliLiPT 'where the pi are pairwise different prime numbers 

^ ; then k{1) = lcm(/€(p^i), . . . , 

It is therefore sufficient to understand k on prime powers. 

b) k{2) = 3 and k(5) = 20. Otherwise, 

• if p is a prime such that p = ±1 (mod 5) then K,{p)\{p — 1). 

• If p is a prime such that p = ±2 (mod 5) then K,{p)\{2p + 2) but K,{p)\{p + 1). 
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c) If I >^ then k{1) is even. 

d) Ifp is prime, e>l, and p^lF^^p) but p'^'^^\Fk(^p) then 



(1) K{p^) 



k{p) for n < e, 
k{p) ■p^~^ for n> e. 



2. The Open Problems 
2.1. The Period Length Modulo a Prime. 

2.1.1. It is quite surprising that the Fibonacci sequence still keeps secrets. But there 

arc at least two of them. 

Problem 2.1.2. The first open problem is "What is the exact value of k{p)T\ 
Equivalently, one should understand precisely the behaviour of the quotient Q 
given by Q{p) := for p = ±1 (mod 5) and Q{p) := ^^^y^ for P = ±2 (mod 5). 
One might hope for a formula expressing Q{p) in terms of p but, may be, that is 
too optimistic. 

2.1.3. It is known that Q is unbounded. This is an elementary result due to D. Jar- 
den [Ja, Theorem 3]. 

On the other hand, Q does not at all tend to infinity. If fact, in his unpublished 
Ph.D. thesis [Go], G. Gottsch computes a certain average value of ^. To be more 
precise, under the assumption of the Generalized Riemann Hypothesis, he proves 

^ = C ^ + Q /^ a^loglogx x 

p<x, p prime 

where Ci = ||| Hp prime(l - ^) ~ 0-331 055 98. The proof shows as well that the 
density of {p prime | Q{p) = l.p = ±1 (mod 5)} within the set of all primes is 
equal to C, = % Up prime(l - ^) ^ 0.265 705 45. 

Not assuming any hypothesis, it is still possible to verify that the right hand 
side constitutes an upper bound. For that, the error term needs to be weakened 

to o( ^i»s;°g;°g^ ). 

^ log X log log X ' 

For the case p = ±2 (mod 5), G. Gottsch's results are less strong. Under the 
assumption of the Generalized Riemann Hypothesis, he establishes the estimate 

^_ < c ^— + o ( ^^Qg^Qg^Qg^ ^ 

ps±2 (^od 5) ^ ~ ^ ^°g ^ V log X • log log X ) 

p=3 (mod 4) 
p<x,p prime 

where C3 = i Hp prime,p^2,5(l " ^) ~ 0-210 055 99. The density of the set 
{p prime | Q{p) = l.p = ±2 (mod 5),p = 3 (mod 4)} within the set of all primes 
is at most C4 = inpprime,p^2,5(l " ^) ~ 0.196 818 85. 
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2.1 A. It seems, however, that the inequahties could well be equalities. In addition, 
the restriction to primes satisfying p = 3 (mod 4) might be irrelevant. 

In fact, we performed a count for small primes p < 2 • 10^ by computer. Up to 
that bound, there are 317687 prime numbers such that p = ±2 (mod 5) and 
p = 3 (mod 4). At them, we find Q{p) = 1 exactly 250 246 times which is a relative 
frequency of 0.787 712 434 . . . = 4 ■ 0.196 928 108 ... . 

On the other hand, there are 317 747 primes p satisfying p = ±2 (mod 5) and 
p = 1 (mod 4). Among them, Q{p) = 1 occurs 250 353 times which is basically the 
same frequency as in the case p = 3 (mod 4). 

2.2. The Period Length Modulo a Prime Power. 

Problem 2.2.1. There is another open problem. In fact, there is one question 
which was left open in the formulation of Theorem 1.2: What is the exact value 
of e in dependence of p? Experiments for small p show that e = 1. Is this always 
the case? In other words, does one always have 

(2) k(p") = k{p) ■ p"-^ 

similarly to the famous formula for Euler's (/? function? 

This is the most perplexing point in D. D. Wall's whole study of the Fibonacci 
sequence modulo m. For p < 10'^, it was investigated by help of an electronic 
computer by Wall in 1960, already. 

2.2.2. We continued Wall's investigation concerning Problem 2.2.1 on today's ma- 
chines. Our main result is Theorem 4.4 below. The purpose of the present article is 
to give a description of our approach, particularly of the various algorithms devel- 
oped and optimizations used. 

Definition 2.2.3. We call a prime number p exceptional if equation (2) is wrong 
for some n>2. 

Fundamental Lemma 2.2.4. Assume p ^ 2 and I to be a multiple of k{p). 
Then, p^\Fi is sufficient for I being a period of {Fj. mod p^}k>Q, i-^- for k{p^)\1. 

Proof. The claim is that, in our situation, F^+i = 1 (mod p^) is automatic. 

For that, we note that there is the standard formula — 

which we explain in (5) below. Here, by assumption, F/ = (mod p^) and, by virtue 

of the recursion, = Fi^i (mod p^). Therefore, F^^-,^ = 1 (mod p^). 

On the other hand, the condition k{p)\1 implies that F^+i = 1 (mod p). 

As p 7^ 2, Hensel's lemma says that the lift is unique. This shows F;_|_i = 1 (mod p^) 
which is our claim. □ 

Proposition 2.2.5. Let p be a prime number. Then, the following assertions 
are equivalent. 
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i) p is exceptional, 

ii) is divisible by p^ . 

Proof, "i) =^ ii)" Assume, to the contrary, that p^\Ff^(j,y By definition of k{p), 
we know for sure that nevertheless p|Fk(p). Together, these statements mean. The- 
orem 1.2.d) may be apphed for e — 1 showing k{p'^) — k{p) ■ p^~^ for every n e N. 
This contradicts i). 

"ii) =^ i)" We choose the maximal e G N such that p^|-Fk(p). By assumption, e > 2. 
Then, Theorem 1.2.d) implies k{p^) = k{p) which shows equation (2) to be wrong 
for n = 2. p is exceptional. □ 

Proposition 2.2.6. Let p 2,5 be a prime number. 

I. If p = ±1 (mod 5) then the following assertions are equivalent. 

i) p is exceptional, 

ii) Fp_i is divisible by p^, 

iii) For r = e Z/p'^Z one has r^-^ = 1. 

II. If p = ±2 (mod 5) then the following assertions are equivalent. 

i) p is exceptional, 

ii) F2P+2 is divisible by p^ , 

iii) Fp+i is divisible byp"^. 

iv) In Rp := Z/p^Z [r]/(r^ — r — 1) one has r^^^ = —1. 

Proof. 1. "i) =^ iii)" Wc put s = and use formula (3) below. By Proposi- 

tion 2.2.5, is divisible by p^. Therefore, r^^*') = s'^(p) = (r^^^))"^ G (Z/p^Z)*, 
i.e. (r'^^P))^ = 1. Since K,{p)\{p — 1), we may conclude (r^"^)^ — \ from this. 
On the other hand, we know r^~^ = 1 (mod p) by Fermat's Theorem. Unique- 
ness of Hensel's hft implies r^~^ = 1. 

"in) =^ ii)" We have r^-^sP"^ = (-1)^"^ = 1. Thus, r^'^ = 1 implies s^'^ = 1. 
Consequently, mod p^) = — = and Fp_i is divisible by p^. 

"ii) i)" As {p — 1) is a multiple of K,{p), Lemma 2.2.4 may be applied. It shows 
k{p'^)\{p — 1). This contradicts equation (2) for n = 2. p is exceptional. 

II. "i) =^ ii)" By Proposition 2.2.5, -Fk(p) is divisible by p"^. In that situation. 
Lemma 2.2.4 implies that k{p) is actually a period of {F^ mod p'^}k>o- By conse- 
quence, {2p -|- 2) is a period of {Fk mod p^}k>o, too. This shows p'^\F2p+2- 

"ii) =^ in)" Since -^2^+2 = -P'p+i^+i, all we need is p\Vp^i. This, however, is clear 
as Vp+i = r*'+i + F'+i = -2 (mod p). 

"iii) =^ iv)" The assumption imphes r*'+^ = f^'+^ i.e. rP+^ e Z/p^Z. As rf = -1, 
we may conclude {r^^^Y — r^'^^f^'^^ — (— l)^"^-*^ = 1 from this. Hensel's lemma 
implies r^"*"^ = — 1 since r^^^ = — 1 (mod p) is known. 
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"iv) =^ i)" = -1 makes sure that (^2^+2 mod p^) = = 0. 

As (2p+2) is a multiple of Lemma 2.2.4 may be applied. It shows K(p^)|(2p+2). 
This contradicts equation (2) for n = 2. p is exceptional. □ 

Remark 2.2.7. By Proposition 2.2.6, the problem of finding exceptional primes is 
in perfect analogy to the problem of finding Wieferich primes. 

In the Wieferich case, one knows 2*'"^ = 1 (mod p) and would like to understand 
the set of all primes for which even 2^~^ = 1 (mod p^) is valid. Here, we know 
= (mod p) and look for the primes which fulfill ^^(p) = (mod p^). 

At least in the case p = ±1 (mod 5), there is, in fact, more than just an anal- 
ogy. We consider a particular case of the generalized Wieferich problem where 2 is 
replaced by r. 

Remark 2.2.8. One might want to put the concept of an exceptional prime into 
the wider context of algebraic number theory. We work in the number field Q(\/5) 
in which r = ^'^^ is a fundamental unit. 

By analogy, we could say that an odd prime number p is exceptional for the real qua- 
dratic number field K — Q(\/rf) if, for e a fundamental unit in K, s^"^ = 1 (mod p^) 
when (^) = 1, £2p+2 = 1 (mod ^^len (^) = -1, or e^^-'^^ = 1 (mod p^) in the 
ramified case d\p. A congruence modulo p^ is, of course, supposed to mean equality 
in ^k/{p^) which, as p 7^ 2, is isomorphic to Z[v^]/(p2) = Z/p^Z [X]/{X'^ - d). 
Note that there is no ambiguity coming from the choice of e since all exponents 
are even. 

For many real quadratic number fields it does not require sophisticated program- 
ming to find a few exceptional primes. Below, we give the complete list of all ex- 
ceptional primes p < 10^ for the fields Q(\/rf) where d is square- free and up to 101. 
Thereby, primes put in parentheses are those such that Q(yd) is ramified at p. 
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exceptional primes p 
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exceptional primes p 


d 


exceptional primes p 


2 


13, 31, 1546 463 


17 




34 


37, 547, 4 733 


3 


103 


19 


79, 1271731, 13 599 893, 31352 389 


35 


23, 577, 1325 663 


5 




21 


46 179 311 


37 


7, 89, 257, 631 


6 


(3), 7, 523 


22 


43, 73, 409, 28 477 


38 


5 


7 




23 


7, 733 


39 


5, 7, 37, 163 409, 795 490 667 


10 


191, 643, 134339, 2B 233 137 


26 


2 683, 3 967, 18 587 


41 


29, 53, 7 211 


11 




29 


3, 11 


42 


(3), 5, 43, 71 


13 


241 


30 




43 


3, 479 


14 


6 707 879, 93 140 353 


31 


157, 261687 119 


46 


(23) 


15 


(3), 181, 1 039, 2917, 2401457 


33 


(3), 29, 37, 6713797 


47 


5 762 437 


d 


exceptional primes p 


d 


exceptional primes p 


d 


exceptional primes p 


51 


(3), 5, 37, 4831 


67 


3, 11, 953, 57301 


83 


3, 19 699, 2 417 377 


53 


5 


69 


(3) , 5, 17, 52 469 057 


85 


3, 204520 559 


55 


571 


70 


(5), 59, 20411 


86 


1231, 5 779 


57 


59, 28 927, 1726 079, 7 480159 


71 


67, 2 953, 8 863, 522 647 821 


87 


(3), 17, 757, 1 123 


58 


3, 23, 4 639, 172 721, 16 557419 


73 


5, 7, 41, 3947, 6079 


89 


5, 7, 13, 59 


59 


1559, 17 385 737 


74 


3, 7, 1 171 


91 


(13), 1218691 


61 




77 


3, 418 270 987 


93 


(3), 13 


62 


3, 5, 263, 388897 


78 


(3), 19, 62591 


94 


73 


65 


1327, 8 831, 569 831 


79 


3, 113, 4049, 6199 


95 


6 257, 10 937 


66 


21023, 106 107 779 


82 


3, 5, 11, 769, 3256531, 624451181 


97 


17, 3 331 










101 


7, 19 301 
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Among these 158 exceptional primes, there are exactly nine for which even the 
stronger congruence modulo is true. These are p = 3 for d = 29, 42, 67, and 74, 
p = 5 for d = 62, 73, and 89, p = 17 for d = 69, and p = 29 for d = 41. We do not 
observe a congruence of the type above modulo p^. 



3. Background 

3.1. Part a) of Wall's theorem is trivial. 
For the proof of b), Binet's formula 

® - -ir' 

where r = ^-^j^ and s = is of fundamental importance. It is easily established 

by induction. If p = ±1 (mod 5) then 5 is a quadratic residue modulo p and, 
therefore, G Fp. Fermat states their order is a divisor of p — 1. 

Otherwise, G Fp2 are elements of norm (—1). As the norm map N: F*2 — >■ F* 
is surjective, its kernel is a group of order 2—-!- = p+l and ^N~^{{ 1, —1 }) = 2p+2. 

As F*2 is cyclic, we see that N''^{{ 1, —1 }) is even a cyclic group of order 2p + 2. 
N{r) — N{s) — —1 implies that both r and s are not contained in its subgroup of 
index two. Therefore, 

(4) rP+' = = -1 (mod p). 

From this, we find Fp+2 = ^^"""^^^^^ = = = — 1 (mod p) which shows p + 1 
is not a period of {Fk}k>o modulo p. 

c) In the case p = ±2 (mod 5) this follows from b). It is, however, true in general. 
Indeed, for every /c G N, one has 



jt^k I j^^H~l gk 1 j^k 1 Qk~\~\ y-»2fc | 2?^^ 5^ 

Fk+iFk-i - -Ffe = z 



(5) _ -(-l)^-Hr^ + ^^) + 2(-l)^ 



= (-1)' 

as rs = — 1 and r"^ -\- — 3. On the other hand. 



- = 1 . 1 - 0^ = 1 (mod 0- 

As / > 3 this implies k{1) is even. 

For d), it is best to establish the following p-uplication formula first. 
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Lemma 3.2 (Wall). One has 

j odd 

Here, {Vk}k>o is the Lucas sequence given by Vq — 2, Vi — 1, and — T4_i + T4_2 
for k>2. 

Proof. Induction shows Vk — r'' + s'^. Having that in mind, it is easy to calculate 
as follows. 

The assertion follows from the Binomial Theorem. □ 

3.3. The fundamental Lemma 2.2.4 allows us to prove d) for p 7^ 2 in a somewhat 
simpler manner than D. D. Wall did it in [Wa]. 

First, we note that for n < e, 2.2 A implies k{p'^)\k{p). However, divisibility the 
other way round is obvious. 

For n > e, by Lemma 2.2.4, it is sufficient to prove ^'^(-^^(p).^"-^) = n, i.e. that 
p"|F„(p).pn-e but p"''"^t-^K;(p).pn-e. ludccd, the flrst divisibility imphes K,{p^)\n{p) -p""^ 
while the second, apphed for n — 1 instead of n, yields k>{p'^)\k>{p) -p""^"^. The result 
follows as k(p)|k(p"). 

For i^p{FK(^pypn-e) = n, we proceed by induction, the case n — e being known by 
assumption. One has 

i=3 
J odd 

In the second term, every summand is divisible by -^^(p).pn-e, i-e. by p^"'. The claim 
would follow if we knew p\V^(j,ypn-e. This, however, is easy as there is the formula 

(7) Vi = Fi.i + F^+i 
which implies = 2 (mod p) for / any multiple of k{p). 

3.4. For p = 2, as always, things are a bit more complicated. We still have 
/t(2"') = 3 ■ 2"^^. However, for n > 2, one has 2"'+-'^|F3.2n-i for which there is no 
analogue in the p 7^ 2 case. On the other hand, t'2(-f3-2"-i+i — I) = n which is 
sufficient for our assertion. 

The duplication formula provided by Lemma 3.2 is 

(8) F2k = FkVk = Fk{Fk_i + Ffe+i) = F^ + 2FkFk-i. 

As Fe — 8, a, repeated application of this formula shows 2"+^|F3.2n-i for every n > 2. 
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We further claim F2k+i — + -F|+i. Indeed, this is true for A; = as 1 = 0^ + 1^ 
and we proceed by induction as follows: 

F2k+3 = F2k+i + F2k+2 = F^ + F^^i + + 2Fk+iFk = 

= ^k+i + {Fk + Fk+if = -F^+i + F^^2- 

The assertion t'2(-^3-2"-i+i — 1) = n is now easily established by induction. We note 
that Fy = 13 = 1 (mod 4) but the same is no longer true modulo 8. Furthermore, 
-^3-2"+i = -^3.2"-! -^3-2"-i+i where the first summand is even divisible by 2^""*"^. 
The second one is congruent to 1 modulo 2""*"^, but not modulo 2""*"^, by consequence 
of the induction hypothesis. 

4. A HEURISTIC ARGUMENT 

4.1. We expect that there are infinitely many exceptional primes for Q{V5). 

Our reasoning for this is as follows. is known by definition of K,{p). Thus, 

for any individual prime p, {F^(j)-^ mod p^) is one residue out of p possibilities. If we 
were allowed to assume equidistribution then we could conclude that p^|Fk(p) should 
occur with a "probability" of ^. Further, by [RS, Theorem 5], 

loglogA^ + A \ — < V - < loglogA^ + yl + \ — , 

21og2A^~ ^ p~ 21og^Ar' 

^ p prime ° 

p<7V 

at least for > 286. Here, ^4 e R is Mertens' constant which is given by 



^ = 7 + 



E 

p prime 



i + log (l - 1) 



= 0.261 497 212 847 642 783 755 .. . 



lP ^ 

whereas 7 denotes the Euler-Mascheroni constant. 

This means that one should expect around log log N + A exceptional primes less 
than A^. 

4.2. On the other hand, p^\Fk(p) should occur only a few times or even not at all. 
Indeed, if we assume equidistribution again, then for any individual prime p, 
should happen with a "probability" of ^. However, 
00 ^ 

V — = 0.452 247420 041065 498 506... . 

p=2 ^ 
p prime 

is a convergent series. 

Remark 4.3. It is, may be, of interest that, for any exponent n > 2, one has the 
equality ^^^.-^^.^^^ = Yl'k=i C(^^) where the right hand converges a lot faster 

and may be used for evaluation. This equation results from the Moebius inversion 
formula and Euler's formula logC(nfc) = ^ - log(l - ^) = Y17=i 1 S 

p prime p prime 
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4.4. We carried out an extensive search for exceptional primes but, unfortunately, 
we had no success and our result is negative. 

Theorem. There are no exceptional primes p < 10^^. 

Down the earth, this means that one has k{p"') = K,{p) ■ for every n e N and 
all primes p < 10^^. 

5. Algorithms 

5.0. 1. We worked with two principally different types of algorithms. First, in the 
p = ±1 (mod 5) case, it is possible to compute (r^~^ mod p^). A second and more 
complete approach is to compute (Fp-i mod p"^) in the p = ±1 (mod 5) case and 
{F2P+2 mod p^) or (Fp+i mod p^) in the case p = ±2 (mod 5). 

Remark 5.0.2. In the case p = ±2 (mod 5), p 7^ 2, exceptionality is equiva- 
lent to r^P^^ = 1 (mod p^). Unfortunately, an approach based on that observation 
turns out to be impractical as it involves the calculation of a modular power in 
Rp — Z/p'^Z[\/E] = Z[\/5]/(p^) in a situation where \/5 ^ Z/p'^Z. In comparison 
with Z/p^Z, multiplication in Rp is a lot slower, at least in our (naive) implemen- 
tations. This puts a modular powering operation in Rp out of competition with a 
direct approach to compute ^2^+2 (or Fp+i) modulo p^. 

5.1. Algorithms based on the computation of a/5- 
5.1.1. If p = ±1 (mod 5) then one may routinely compute (r^~^ modp^). The 
algorithm should consist of four steps. 

i) Compute the square root of 5 in Z/pZ. 

ii) Take the Hensel's hft of this root to Z/p^Z. 

iii) Calculate the golden ratio r := e TLjp^TL. 

iv) Use a modular powering operation to find (r^~^ mod p^). 

We call algorithms which follow this strategy algorithms powering the golden ratio. 

Here, the final steps iii) and iv) are not critical at all. For iii), it is obvious that 
this is a simple calculation while for iv), carefully optimized modular powering 
operations are available. Further, ii) can be effectively done as = 5 (mod p) 
implies w := r — ■ (^ mod p) ■ p is a square root of 5 modulo p^. Thus, the 
most expensive operation should be a run of Euclid's extended algorithm in order 
to find (^ mod p). 

In fact, there is a way to avoid even this. We first calculate | G Fp. This is 
easier than an arbitrary division in residues modulo p. We may put | := 
if p = I (mod 5) and | := ^ if p = —I (mod 5). Then, a representative v 
of (^ mod p) can be computed as v — r ■ ^. We get away with one integer division 
and one multiplication. 
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5.1.2. Thus, the most interesting point is i), the computation of -\/5 e Fp. In gen- 
eral, there is a beautiful algorithm to find square roots modulo a prime number due 
to Shanks [Co, Algorithm 1.5.1]. Wc implemented this algorithm but let it finally 
run only in the p = 1 (mod 8) case. U p ^ 1 (mod 8) then there are direct formulae 
to compute the square root of 5 which turn out to work faster. 

li p = 3 (mod 4) then one may simply put w := (5^ mod p) to find a square root 
of 5 by one modular powering operation. 

If p = 5 (mod 8) then one may put 

(10) w :— (5~8~ mod p) 
as long as 5~ = 1 (mod p) and 

(11) w := (10 -20^ modp) 

if 5"^ = -1 (modp). Note that 5 is a quadratic residue modulo p. Hence, we 

always have 5^ = ±1 (mod p). 

p — 1 

For sure, (5^ mod p) can be computed using a modular powering operation. 
In fact, we implemented an algorithm doing that and let it run through the in- 
terval [10^2^5 • 10^2] _ 

p—1 

However, (5~ mod p) is nothing but a quartic residue symbol. For that reason, 
there is an actually faster algorithm which we obtained by an approach using the 
law of biquadratic reciprocity. 

Proposition 5.1.3. Let p be a prime number such that p = 5 (mod 8) and 
p = ±1 (mod 5) and let p — + b^ be its (essentially unique) decomposition into 
a sum of two squares. 

a) Then, a and b may be normalized such that a = 3 (mod 4) and b is even. 

b) Assume a and b are normalized as described in a). Then, there are only the 
following eight possibilities. 

i) a = 3, 7, 11, or 19 (mod 20) and b = 10 (mod 20). 

In this case, 5^ = 1 (mod p), i.e. 5 is a quartic residue modulo p. 

ii) a = 15 (mod 20) and 6 = 2, 6, 14, or 18 (mod 20). 

p—1 

Here, 5~ = —1 (mod p), i.e. 5 is a quadratic but not a quartic residue modulo p. 

Proof, a) As p is odd, among the integers a and b there must be an even and an 
odd one. We choose b to be even and force a = 3 (mod 4) by replacing a by (—a), 
if necessary. 

b) We first observe that = 1 (mod 8) forces 6^ = 4 (mod 8) and 6 = 2 (mod 4). 
Then, we realize that one of the two numbers a and b must be divisible by 5. Indeed, 
otherwise we had a^, 6^ = ±1 (mod 5) which does not allow -|- 6^ = ±1 (mod 5). 
Clearly, a and b cannot be both divisible by 5. 
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If a is divisible by 5 then a = 3 (mod 4) implies a = 15 (mod 20). 6 = 2 (mod 4) 
and h not divisible by 5 yield the four possibilities stated. On the other hand, if h is 
divisible by 5 then 6 = 2 (mod 4) implies 6 = 10 (mod 20). a = 3 (mod 4) and 
a not divisible by 5 show there are precisely the four possibilities listed. 

p — 1 

For the remaining assertions, we first note that (5^ mod p) tests whether 
X* = b (mod p) has a solution a; e i.e. whether 5 is a quartic residue modulo p. 
By [IR, Lemma 9.10.1], we know 



where X denotes the quartic residue symbol. The law of biquadratic reciprocity 
[IR, Theorem 9.2] asserts 

Xa+6i(5) = Xb{a + hi). 

For that, we note explicitly that a + bi = 3 + 2i (mod 4), 5 = 1 (mod 4), and 
= 6 is even. Let us now compute X5(o + 6i): 

Xsia + bi) = x_i+2i(a + bi) ■ X-i-2i{a + bi) 



Here, the first equation is the definition of the quartic residue symbol for composite 
elements while the second is [IR, Proposition 9.8.3.c)]. 

For the third equation, we observe that X-i-2j(a — bi) is either ±1 or ±i. By sim- 
ply omitting the complex conjugation, we would make a sign error if and only 
if x_i_2i(a — bi) = ±i. By [IR, Lemma 9.10.1], this means exactly that a — bi 
defines, under the identification 2i = —1, not even a quadratic residue modulo 5. 
Therefore, the correction factor is (-^)- The final equation follows from [IR, Propo- 
sition 9.8.3.b)]. 

We note that, by virtue of [IR, Lemma 9.10.1], X-i-2i{p) tests whether p is a quartic 
residue modulo 5 or not. As p is for sure a quadratic residue, we may write 



or, if we want, x~i-2i{p) — {p mod 5). 

The eight possibihties could now be inspected one after the other. A more conceptual 
argument works as follows. In case i), we have 



p—i 

(5~ modp) = Xa+bi{5) 



X-i-2i{a - bi) ■ x-i-2i{a + bi) 





= ^- j = (a^ mod 5) = (a^ + b^ mod 5) = (p mod 5) 
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p — 1 

Therefore, (5~ mod p) — 1. On the other hand, in case ii). 




= — (p mod 5). 

p— 1 

Hence, (5^ mod p) = —1. □ 

5.1.4. Although we are not going to make use of it, let us state the complementary 
result for p = 1 (mod 8) . 

Proposition. Let p be a prime such that p = 1 (mod 8) and p = ±1 (mod 5) and 
let p — + b'^ be its ( essentially unique ) decomposition into a sum of two squares. 

a) Then, a and b may be normalized such that a = 1 (mod 4) and b is even. 

b) Assume a and b are normalized as described in a). Then, there are only the 
following eight possibilities. 

i) a = 1, 9, 13, or 17 (mod 20) and 6 = (mod 20). 

In this case, 5^ = 1 (mod p), i.e. 5 is a quartic residue modulo p. 

ii) a = 5 (mod 20) and b = A,8, 12, or 16 (mod 20). 

Here, 5^ = — 1 (mod p), i.e. 5 is a quadratic but not a quartic residue modulo p. 

Proof, a) As p is odd, among the integers a and b there must be an even and an 
odd one. We choose b to be even and force a = 1 (mod 4) by replacing a by (—a), 
if necessary. 

b) We first observe that = 1 (mod 8) forces 6^ = (mod 8) and 4|6. Then, we 
realize that one of the two numbers a and b must be divisible by 5. Indeed, otherwise 
we had a^, 6^ = ±1 (mod 5) which does not allow + 6^ = ±1 (mod 5). Clearly, a 
and b cannot be both divisible by 5. 

If a is divisible by 5 then a = 1 (mod 4) implies a = 5 (mod 20). 4|6 and b not 
divisible by 5 yield the four possibilities stated. On the other hand, if b is divisible 
by 5 then 4|6 implies 6 = (mod 20). a = 1 (mod 4) and a not divisible by 5 show 
there are precisely the four possibihties listed. 

The proof of the remaining assertions works exactly in the same way as the proof 
of Proposition 5.1.3 above. We note exphcitly that a + bi = 1 (mod 4) makes sure 
that the law of biquadratic reciprocity may be applied. □ 

5.1.5. As the transformation a h-> —a does not affect any of the three statements 
below, we may formulate the following theorem. Actually, this is the result we need 
for the application. 
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Theorem. Letp he a prime number such thatp = 1 (mod 4) andp = ±1 (mod 5) 
and let p = + b'^ be its decomposition into a sum of two squares. We normalize 
a and b such that a is odd and b is even. Then, the following three statements 
are equivalent. 

i) 5 is a quartic residue modulo p. 

ii) b is divisible by 5. 

iii) a is not divisible by 5. 

Remark 5.1.6. We note that the restrictions on p exclude only trivial cases. 
If p ^ ±1 (mod 5) then 5 is not even a quadratic residue modulo p. If p = 3 (mod 4) 
then every quadratic residue is automatically a quartic residue. 

Algorithm 5.1.7. The square sum sieve algorithm for prime numbers p such that 
= 21, 29 (mod 40) runs as follows. 

We investigate a rectangle [A^i, iV2] x [Mi, M2] of numbers. Wc will go through the 
rectangle row-by-row in the same way as the electron beam goes through a screen. 

a) We add 0, 1, 2, or 3 to Mi to make sure Mi = 2 (mod 4). Then, we let b go from 
Ml to M2 in steps of length four. 

b) For a fixed b we sieve the odd numbers in the interval [A^i, A^2]- 

Except for the odd case that l\a,b which we decided to ignore as the density of these 
pairs is not too high, Z|a^ -|- b^ implies that (—1) is a quadratic residue modulo Z, 
i.e. we need to sieve only by the primes I = 1 (mod 4). 

For each such / which is below a certain limit we cross out all those a such that 
a = ±vib (mod /). Here, Vi is a square root of (—1) modulo I, i.e. vf = —1 (mod /). 
For practical application, this requires that the square roots of (—1) modulo the 
relevant primes have to be pre-computed and stored in an array once and for all. 

c) For the remaining pairs (a, b), we compute p ~ a^ + b"^ and do steps i) through iv) 
from 5.1.1. In step i), if 6 is divisible by 5 then we use formula (10) to compute the 
square root of 5 modulo p. Otherwise, we use formula (11). 

5.1.8. In practice, we ran the square sum sieve algorithm on the rectangles 
[0, 4 000 000] X [1 580 000, 4 000 000] and [1 580 000, 4 000 000] x [0, 1 580 000] , thereby 
capturing every prime p £ [5 • 10^^, 1.6 • 10^^] such that p = 21,29 (mod 40) plus 

several others. 

In fact, on the second rectangle we ran a modified version, the inverted square sum 
sieve, where the two outer loops are reversed. That means, we let a go through the 
odd numbers in [A^'i, A^2] in the very outer loop. This has some advantage in speed 
as longer intervals are sieved at once. In other words, we go through the rectangle 
column-by-column. 

We implemented the square sum sieve algorithms in C using the mpz functions of 
GNU's GMP package for arithmetic on long integers. On a single 1211 MHz Athlon 
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processor, the computations for the first rectangle took around 22 days of CPU time. 
The computations for the smaller second rectangle were finished after nine days. 

5.1.9. For primes p such that p = 3 (mod 4) and p = ±1 (mod 5), the formula 
w := (5^ mod p) for the square root of 5 makes things a lot easier. Instead of 
the square sum sieve we implemented the sieve of Eratosthenes. Caused by the 
limitations of main memory in today's PCs, we could actually sieve intervals of only 
about 250 000 000 numbers at once. For each such interval the remainders of its 
starting point have to be computed (painfully) by explicit divisions. 

Algorithm 5.1.10. More precisely, the algorithm powering the golden ratio for 
primes p = 11, 19 (mod 20) runs as follows. 

We investigate an interval [Ni, N2]. We assume that A''2 — A^i is divisible by 5 • 10^ 
and that Ni is divisible by 20. 

a) We let an integer variable i count from to ^l^^^ — 1. 

b) For fixed i we work on the interval / = [Ni + 5 • 10^ ■ i,Ni + 5 ■ 10^ ■ {i + 1)]. 
For each prime I which is below a certain limit, we compute {Ni + 5 ■ 10^ ■ i mod I). 
Then, we cross out all p e /, p = 11 (or 19) mod 20 which are divisible by /. 

c) For the remaining p e 7, p = 11 (or 19) mod 20 we do steps i) through iv) 
from 5.1.1. In step i), we use the formula w :— (5~ mod p) to compute the square 
root of 5 modulo p. 

5.1.11. In practice, wc ran this algorithm in order to test all prime numbers 
p e [10^^,4 ■ 10^^] such that p = 11 (mod 20) or p = 19 (mod 20). It was im- 
plemented in C using the mpz functions of the CMP package. 

Later, when testing primes above 10^^, we used the low level mpn functions for 
long natural numbers. In particular, we implemented a modular powering function 
which is hand-tailored for numbers of the considered size. It uses the left-right 
base 2^ powering algorithm [Co, Algorithm 1.2.3] and the sliding window improve- 
ment from mpz_powm. 

Having done all these optimizations, work on the test interval [4- 10^'^, 4- 10^'^ + 5- 10^] 
of 250 000 000 numbers p such that p = 11 (mod 20), among them 19 955 067 primes, 
lasted 7:50 Minutes CPU time on a 1211MHz Athlon processor. 

5.1.12. Similarly, for prime numbers p satisfying the simultaneous congruences 
p = 1 (mod 8) and p = ±1 (mod 5), we implemented Shanks' algorithm [Co, Algo- 
rithm 1.5.1] to compute the square root of 5 modulo p. 
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Algorithm 5.1.13. More precisely, the algorithm powering the golden ratio for 
primes p = 1, 9 (mod 40) runs as follows. 

We investigate an interval [A^i, A^2]- We assume that N2 — Ni is divisible by 10^° 
and that A^i is divisible by 40. 



b) For fixed i we work on the interval I =[Ni + 10^° ■i,Ni + 10^° ■ {i + 1)]. For each 
prime I which is below a certain limit, we compute ((A^i + 10^° • i) mod I). Then, 
we cross out all p e /, p = 1 (or 9) (mod 40) which are divisible by I. 

c) For the remaining p G /, p = 1 (or 9) (mod 40) we do steps i) through iv) 
from 5.1.1. In step i), we use Shanks' algorithm to compute the square root of 5 
modulo p. 

5.1.14. We ran this algorithm on the interval [10^^,4 ■ 10^^]. After all optimiza- 
tions, the test interval [4 • 10^^4 • 10^^ + 10^°] of 250 000 000 numbers p such that 
p = 1 (mod 40), among them 19 954152 primes, could be searched through on a 
1211MHz Athlon processor in 10:30 Minutes CPU time. 

This is quite a lot more in comparison with the algorithm for p = 11 (mod 20) 
or p = 19 (mod 20). The difference comes entirely from the more complicated 
procedure to compute y/5 e Fp. 

Remark 5.1.15. At a certain moment, such a running time was no longer found rea- 
sonable. A direct computation of the Fibonacci numbers could be done as well. After 
several optimizations of the code of the direct methods, it turned out that only the 
3 mod 4 case could still compete with them. We discuss the direct methods in the 
subsection below. 

5.2. Algorithms for a direct computation of Fibonacci numbers. 

Algorithm 5.2.1. A nice algorithm for the fast computation of a Fibonacci number 
is presented in O. Forster's book [Fo]. It is based on the formulae 



and works in the spirit of the left-right binary powering algorithm using bits. 

Our adaption uses modular operations modulo p^ instead of integer operations. 
An implementation in O. Forster's Pascal-style multi precision interpreter language 
ARIBAS looks hke this. 



a) We let an integer variable i count from to 



N2-N1 



- 1. 



(12) 



F2k-i - Fk+ -^fc-i' 
F2k = Fk 2FfcFfc_i. 
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(* 

(* 

** Schnelle Berechnung der Fibonacci-Zahlen mittels der Formeln 

** fib(2*k-l) = fib(k)**2 + fib(k-l)**2 

** fib(2*k) = fib(k)**2 + 2*f ib(k)*f ib(k-l) 

** 

** Dabei werden alle Berechnungen mod m durchgefiilirt 
*) 

function fib(k,iii : integer): integer; 
var 

b, X, y, XX, temp: integer; 
begin 

if k <= 1 then refurn k end; 
X := 1; y := 0; 

for b := bit_length(k)-2 to by -1 do 
XX := x*x mod m; 
X := (xx + 2*x*y) mod m; 
y := (xx + y*y) mod m; 
if bit_test(k,b) then 

temp := x; 

X := (x + y) mod m; 

y := temp; 

end; 

end; 

return x; 

end. 

(** ein systematischer Versuch**) 

function testO : integer 

var 

p,r,rl : integer; 
ptest : boolean; 
begin 

for p := 90000000001 to 95000000001 by 2 do 
if Cp mod 10000) = 1 then 

writeln("getestete Zahl: p) ; 

end; 

ptest := rab_primetest (p) ; 
if Cptest = true) then 

if CCp mod 5=2) or Cp mod 5=3)) then 
r := f ib(2*p+2,p*p) ; 

else 

r := f ib(p-l ,p*p) ; 

end; 

if (r <= 30000000000000000) then 
rl := r div p; 

writelnCp," ist eine interessante Primzahl. Quotient rl) ; 

end; 

end; 

end; 

return (0) ; 

end. 

A call to f ib(k,m) computes {F}. mod m). test is the main function. testO 
executes an outer loop which contains a Rabin-Miller composedness test. For a 
pseudo prime p, it uses the function fib to compute {Fp^i mod p^) or (^2^+2 mod p^). 
As these are divisible by p we output the quotient instead. Note that in order to limit 
the output size we actually write an output only when the quotient is rather small. 

5.2.2. ARIBAS is fast enough to ensure that this algorithm could be run from p — 7 
up to 10^-^. We worked on ten PCs in parallel for five days. That was our first bigger 
computing project concerning this problem. It showed that no exceptional primes 
p < 10^^ do exist, thereby a establishing a lightweight version of Theorem 4.4. 
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5.2.3. The running time made it clear that we had approached to the hmits of an 
interpreter language. For a systematic test of larger prime numbers, the algorithm 
was ported to C. For the arithmetic on long integers we used the mpz functions 
of GMP. After only one further optimization, the integration of a version of the 
sieve of Eratosthenes, the interval [10^^ 10^^] could be attacked. A test interval 
of 250 000 000 numbers was dealt with on a 1211MHz Athlon processor in around 
40 Minutes CPU time. Again, we did parallel computing on ten PCs. The search 
through [10^^, 10^^] was finished in less than five days. 

5.2.4. For the interval [10^^, 10^^], the methods which compute -\/5 e Fp and square 
the golden ratio were introduced as they were faster than our implementation of 
O. Forstcr's algorithm at that time. For this reason, only the case p = ±2 (mod 5) 
was done by Forster's algorithm. It took us around 20 days on ten PCs. 

6. Optimizations 

6.1. Sieving. 

6.1.1. Near 10^^, one of about 32 numbers is prime. We work in a fixed prime 
residue class modulo 10, 20, or 40 but still, only one of about 13 numbers is prime. 
We feel that the computations of {Fp±i mod p^) should take the main part of the 
running time of our programs. Our goal is, therefore, to rapidly cxchidc (most of) 
the non-primes from the list and then to spend most of the time on the remain- 
ing numbers. 

There are various methods to generate the list of all primes within an interval. Un- 
fortunately, this section of our code is not as harmless as one could hope for. In fact, 
for an individual number p, one might have the idea to decide whether it is prob- 
ably prime by computing {Fp±i mod p). That is the Fibonacci composedness test. 
It would, unfortunately, not reduce our computational load a lot as it is almost 
as complex as the main computation. This clearly indicates the problem that the 
standard "pseudo primality tests" which are designed to test individual numbers 
are not well suited for our purposes. In this subsection, we will explain what we did 
instead in order to speed up this part of the program. 

6.1.2. Our first programs in ARIBAS in fact used the internal primality test to 
check each number in the interval individually. At the ARIBAS level, this is optimal 
because it involves only one instruction for the interpreter. 

When we migrated our programs to C, using the GMP library, we first tried the same. 
We used the function mpz_probab_prime with one repetition for every number to 
be tested. It turned out that this program was enormously inefficient. It took 
about 50 per cent of the running time for primality testing and 50 per cent for the 
computation of Fibonacci numbers. However, it could easily be tuned by a naive 
implementation of the sieve of Eratosthenes in intervals of length 1 000 000. 
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We first combined sieving by small primes and the mpz_probab_prime function 
because sieving by huge primes is slow. This made sure that the computa- 
tion of Fibonacci numbers took the major part of the running time. However, 
mpz_probab_prime is not at all intended to be combined with a sieve. In fact, it 
checks divisibility by small primes once more. Thus, an optimization of the code for 
the Fibonacci numbers reversed the relation again. It became necessary to carry out 
a further optimization of the generation of the list of primes. We decided to aban- 
don all pseudo primality tests. Further, we enlarged the length of the array of up to 
250 000 000 numbers to minimize the number of initializations. 

In principle, the sieve works as follows. Recall that we used different algorithms 
for the computation of the Fibonacci numbers, depending on the residue class of p 
modulo 10, 20, or 40. This leads to a sieve in which the number 

S{i) :— starting point -|- residue -|- modulus • i 

is represented by array position i. Since all our moduli are divisible by 2 and 5 we 
do no longer sieve by these two numbers. 

Such a sieve is still easy to use. Given a prime p ^ 2, 5, one has to compute the 
array index io of the first number which is divisible by p. Then, one can cross out 
the numbers at the indices io, io +P, io + 2p, ... until the end of the sieve is reached. 

6.1.3. Optimization for the Cache Memory. An array of the size above fits 
into the memory of today's PCs but it does not fit into the cache. Thus, the speed- 
limiting part is the transfer between CPU and memory. Sieving by big primes is like 
a random access to single bytes. The memory manager has to transfer one block to 
the cache memory, change one byte, and then transfer the whole block back to the 
memory. This is the limiting bottleneck. 

To avoid this problem as far as possible, we built a two stage sieve. 

In the first stage, we sieve by the first 25 000, the "small" , primes. For that, we divide 
the sieve further into segments of length 30 000. These two constants were found to 
be optimal in practical tests. They are heavily machine dependent. 

The first stage is now easily explained. In a first step, we sieve the first segment by all 
small primes. Then, we sieve the second segment by all small primes. We continue 
in that way until the end of the sieve is reached. 

In the second stage, we work with all relevant "big" primes on the complete sieve, 
as usual. 

The result of this strategy is a sieve whose segments fit into the machine's cache. 
Thus, the speed of the first sieve stage is the speed of the cache, not the speed of 
the memory. The speed of the second stage is limited by the initialization. 

On our machines the two stage sieve is twice as fast as the ordinary sieve. 
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6.1.4. The choice of the prime hmit for sieving is a point of interest, too. As we 
search for one very particular example, it would do no harm if, from to time, we 
test a composite number p for p^\Fp±i. When the computer would tell us divides 
Fp±i which, in fact, it never did then it would be easy to do a reliable primality test. 

As long as we sieve by small primes, it is clear that lots of numbers will be crossed 
out in a short time and this will reduce the running time as it reduces the number 
of times the actual computation of (-f],±i mod p^) is called. Afterwards, when we 
sieve by larger primes, the situation is no longer that clear. We will often cross 
out a number repeatedly which was crossed out already before. This means, it can 
happen that further sieving costs actually more time than it saves. 

Our tests show nevertheless that it is best to sieve almost till to the square root 
of the numbers to be tested. We introduced an automatic choice of the variable 
primeJimit as 13^^ which means we sieve by the first [j^^^] primes. Here, p means 
the first prime of the interval we want to go through. 

6.1.5. Another optimization was done by looking at the prime three. Every third 
number is crossed out when sieving by this prime and, when sieving by a big- 
ger prime, every third step hits a number which is divisible by three and already 
crossed out. 

Thus, we can work more efficiently as follows. Let p be a prime bigger than three and 
coprime to the modulus. We compute io, the first index of a number divisible by p. 
Then, we calculate the remainder of the corresponding number modulo three. If it 
is zero then we skip and continue with := io+P- Now, corresponds to the first 
number in the sieve which is divisible by p but not by three. Thus, we must cross out 
io, io + p,io + 3p, io + 4:p, io + 6p, ... or io, io + 2p, io + 3p, io + 5p, io + Gp, ■■■ 
depending on whether io + 2p corresponds to a number which is divisible by three 
or not. 



6.2. The Montgomery Representation. 

6.2.1. The algorithms for the computation of Fibonacci numbers modulo m ex- 
plained so far spend the lion's share of their running time on the divisions by m which 
occur as the final steps of modular operations such as x : = (xx + 2*x*y) mod m. 
Unfortunately, on today's PC processors, divisions are by far slower than multipli- 
cations or even additions. 

An ingenious method to avoid most of the divisions in a modular powering operation 
is due to P. L. Montgomery [Mo] . We use an adaption of Montgomery's method to 
O. Forster's algorithm which works as follows. 

Let R be the smallest positive integer which does not fit into one machine word. 
That will normally be a power of two. On our machines, R — 2^'^. Recall that all op- 
erations on unsigned integers in C are automatically modular operations modulo R. 
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We choose some exponent n such that the modulus m — fulfills m < In our 
situation p < 10^^, therefore m — p^ < 10^^ < such that n — ?> will be sufficient. 

Instead of the variables ... e Z/mZ, we work with their Montgomery rep- 

resentations XM,yM, ■■■ € These numbers are not entirely unique but bound 
to be integers from the interval [0, -^) fulfilling xm = (mod m). This means 
that modular divisions still have to be done in some initialization step, one for each 
variable that is initially there, but these turn out to be the only divisions we are 
going to execute! 

A modular operation, for example x :— {{x^ + 2xy) mod m), is translated into its 
Montgomery counterpart. In the example this is 

fxl^ + 2xMyM , \ 

Xm '■— mod m . 

V i?" / 

We see here that xu^yu < ^ implies x%[ + 2xMyM < 3 • An inspection of 

0. Forster's algorithm shows that we always have to compute mod m) for 
some A < 5 • ^ = 

A I— >■ ( — mod m] 

is Montgomery's REDC function. It occurs everywhere in the algorithm where 
normally a reduction modulo m, i.e. A {A mod m), would be done. 

This looks as if we had not won anything. But, in fact, we won a lot as for computer 
hardware it is much easier to compute mod m), which is a "reduction from 
below" , than {A mod m) which is a "reduction from above" and usually involves 
trial divisions. 

Indeed, A fits into 2n machine words. It has 2n so-called limbs. The rightmost, 

1. e. the least significant, n of those have to be transformed into zero by adding some 
suitable multiple of m. Then, these n limbs may simply be omitted. 

Which multiple of m is the suitable one that erases the rightmost limb Aq of A7 
Well, q ■ m for q := {—Aq ■ mod R) will do. This operation is in fact an 
ordinary multiplication of unsigned integers in C as (— ^o) on unsigned integers 
means (R—Ao) and multiplication is automatically modulo R. We add q-m to A and 
remove the last limb. This procedure of transforming the rightmost machine word 
of A into zero and removing it has to be repeated n times. 

Still, m needs to be inverted modulo R = 2^^. The naive approach for this would be 
to use Euclid's extended algorithm which, unfortunately, involves quite a number 
of divisions. At least, wc observe that it is necessary to do this only once, not 
n times although there arc n iterations. However, for the purpose of inverting an 
odd number modulo 2^^, there exists a very elegant and highly efficient C macro 
in GMP, named modlimbJnvert. It uses a table of the modular inverses of all odd 
integers modulo 2^ and then executes two Hensel's lifts in a row. Note that, if 
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i ■ n = 1 (mod N) then {2i — ■ n) ■ n = 1 (mod N"^). We observe that, in this 
particular case, we need no division for the Hensel's hft. 

What is the size of the representative of mod m) found? We have A < 
We add to that less than i?"m and divide by R^. Thus, the representative is less than 

^ + i?"m 

— = h m. 

i?" 5 

We want REDC(A) < -y-, the same inequality we have for all variables in Mont- 
gomery representation. To reach that, we may now simply subtract m in the case 
we found an outcome > (This is the point where we use m < 

Our version of REDC looks as follows. In order to optimize for speed, we designed 
it as a C macro, not as a function. 

#define REDCCmp, n, Nprim, tp) \ 
do { \ 

mp_limb_t cy; \ 

mp_limb_t qu; \ 

mp_size_t j ; \ 

\ 

for (j = 0; j < n; { \ 

qu = tp[0] * Nprim; \ 

/* q = tp[0]*invm mod 2"32. Reduktion mod 2"32 von selber! */ \ 

cy = mpn_addmul_l (tp, mp, n, qu) ; \ 

mpn_incr_u (tp + n, cy) ; \ 

tp++; \ 
} \ 

\ 

if (tp[n - 1] >= 0x33333333) /* 2-32 / 5. */ \ 

mpn_sub_n (tp, tp, mp, n) ; \ 
} while (0); 

It is typically invoked as REDC (m, REDC_BREITE, invm, ?);, with various vari- 
ables in the place of the ?, after invm is set by modlimb_invert (invm, in[0]) ; 
and invm = -invm;. Up to now, we always had REDC_BREITE = 3. 

At the very end of our algorithm wc find {Fk)M, the desired Fibonacci number in its 
Montgomery representation. To convert back, we just need one more call to REDC. 
Indeed, 



{Fk mod m) = ( mod ml = f mod m ) = REDC((Ffe)M). 



5 



+ ^"■171 1 



Further, {Fj^)m < ^ implies 

REDC((F,)m) < ^„ ^ , 

i.e. REDC((Ffe)M) < m. 

We note explicitly that there is quite a dangerous trap at this point. The residue 0, 
the one we are in fact looking for, will not be reported as but as m. We work 
around this by outputting residues of small absolute value. If (r mod m) is found 
and r is not below a certain output limit then m — r is computed and compared 
with that limit. 
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Remark 6.2.2. The integration of the Montgomery representation into our algo- 
rithm allowed us to avoid practically all the divisions. This caused a stunning 
reduction of the running time to about one third of its original value. 

6.3. Other Optimizations. 

6.3.1. We introduced several other optimizations. One, which is worth a mention, 
is the integration of a pre-computation for the first seven binary digits of p. Note, if 
we let p go linearly through a large interval then its first seven digits will change very 
slowly. This means, as a study of our algorithm for the computation of {Fp mod p^) 
shows, that the same first seven steps will be done again and again. We avoid this 
and do these steps once, as a pre-computation. As lO-*^^ consists of 47 binary digits 
this saves about 14 per cent of the running time. 

Of course, p is not a constant for the outer loop of our program and its first seven 
binary digits are only almost constant. One needs to watch out for the moment 
when the seventh digit of p changes. 

6.3.2. Another improvement by a few per cent was obtained through the switch 
to a different algorithm for the computation of the Fibonacci numbers. Our hand- 
tailored approach computes the A;-th Fibonacci number Fk simultaneously with the 
k-th Lucas number V^. It is based on the formulae 

F2k — FkVk, 

V2k = V,' + 2{-l)'+\ 
J, _ FkVk + Vi . , .u+i 

This is faster than the algorithm explained above as it involves only one multipli- 
cation and one squaring operation instead of one multiplication and two squaring 
operations. It seems here that the number of multiplications and the number of 
squaring operations determine the running time. Multiplications by two are not 
counted as multiplications as they are simple bit shifts. Bit shifts and additions are 
a lot faster than multiplications while a squaring operation costs about two thirds 
of what a multiplication costs. 

From that point of view there should exist an even better algorithm. One can make 
use of the formulae 

F2fe+l = 4F,2-F^, + 2(-l)^ 
(14) F2k-^^Fl-rFl_^, 

F^k — F^k+x — F2k-\ 

which we found in the GMP source code. If we meet a bit which is set then we 
continue with ^2^+1 and F2k- Otherwise, with F2k and F2k-\- 
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Here, there are only two squaring operations involved and no multiplications, at all. 
This should be very hard to beat. Our tests, however, unearthed that the program 
made from (14) ran approximately ten per cent slower than the program made 
from (13). For that reason, we worked finally with (13). Nevertheless, we expect 
that for larger numbers p, in a situation where additions and bit shifts contribute 
even less proportion to the running time, an algorithm using (14) should actually 
run faster. It is possible that this is the case from the moment on that > 2^^ does 
no longer fit into three limbs but occupies four. 

6.3.3. Some other optimizations are of a more practical nature. For example, in- 
stead of GMP's mpz functions wc used the low level mpn functions for long natural 
numbers. Further, we employed some internal GMP low level functions although 
this is not recommended by the GMP documentation. 

The point is that the size of the numbers appearing in our calculations is a-priori 
known to us and basically always the same. When, for example, we multiply two 

numbers, then it docs not make sense always to check whether the base case multi- 
plication, the Karatsuba scheme, or the FFT algorithm will be fastest. In our case, 
mpn_muLbasecase is always the fastest of the three, therefore we call it directly. 

6.4. The Performance Finally Achieved. 

6.4.1. As a consequence of all the optimizations described, the CPU time it took our 
program to test the interval [4 • 10^^ 4 • 10^^ + 2.5 • 10^] of 250 000 000 numbers p such 
that p = 3 (mod 10), among them 19 955 355 primes, was reduced to 8:08 Minutes. 
Sieving is done in the first 24 seconds. 

The tests were made on a 1211MHz Athlon processor. For comparison, on 
a 1673 MHz Athlon processor we test the same interval in around 6:30 Minutes 
and on a 3 GHz Pentium 4 processor in around 5:30 Minutes. (This relatively poor 
running time might partially be due to the fact that we carried out our trial runs 
on Athlon processors.) 

6.4.2. The Main Computational Undertaking. In a project of somewhat larger 
scale, we ran the optimized algorithm on all primes p in the interval [10^^, 10^^] such 
that p = ±2 (mod 5). Further, as the methods which start with the computation 
of \/5 e F„ are no longer faster, we ran it, too, on all prime numbers p G [4-10^^, 10^^] 
such that p = ±1 (mod 5) and on all primes p e [1.6 • 10^^,4 • 10^^] such that 
p = 5 (mod 8) and p = ±1 (mod 5). 

Altogether, this means that we fully tested the whole interval [10^^,10^^]. To do 
this took us around 820 days of CPU time. The computational work was done in 
parallel on up to 14 PCs from July till October 2004. 
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7. Output Data 

7.1. Computer Proof. Neither our earher computations for p < 10^^ nor the 
more recent ones for the interval [10^^, 10^^] detected any exceptional primes. As 
we covered the intervals systematically and tested each individual prime, this estab- 
lishes the fact that for all prime numbers p < 10^^ one has ^^{^^(p). There are no 
exceptional primes below that limit. Theorem 4.4 is verified. 



7.2. Statistical Observations. We do never find {Fp±i mod p"^) = 0. Does that 
mean, we have found some evidence that our assumption, the residues {Fp±i mod p^) 
should be equidistributed in { 0,p, 2p, ... ,p^ —p}, is wrong? Actually, it does not. 
Besides the fact that the value zero does not occur, all other reasonable statistical 
quantities seem to be well within the expected range. 

Indeed, a typical piece of our output data looks as follows. 



85760594147971. 
85760627258851. 
' 85760847493241. 
' 85761104075891. 



Durchsuche Fenster mit Nummer 34304. 
Beginne sieben. 
Restklassen berechnet. 
Beginne sieben mit kleinen Primzahlen. 
Sieben mit kleinen Primzahlen fertig. 
Fertig mit sieben. 
Initialisiere 

X mit 110560307156090817237632754212345, 
y mit 247220362414275519277277821571239 
und vorz mit 1 . 

10786 Quotient 1912354 mit p := 

10787 Quotient 1072750 mit p := 

10788 Quotient -1617348 mit p : 

10789 Quotient -3142103 mit p : 
Initialisiere 

X mit 178890334785183168257455287891792, 
y mit 400010949097364802732720796316482 
und vorz mit -1. 

10790 Quotient -9341211 mit p := 85761921174961 
Fertig mit Fenster mit Nummer 34304. 

Durchsuche Fenster mit Nummer 34305. 

Beginne sieben. 

Restklassen berechnet. 

Beginne sieben mit kleinen Primzahlen. 

Sieben mit kleinen Primzahlen fertig. 

Fertig mit sieben. 

Initialisiere 

X mit 178890334785183168257455287891792, 
y mit 400010949097364802732720796316482 
und vorz mit -1 . 

10791 Quotient 3971074 mit p : 

10792 Quotient 2441663 mit p : 
Fertig mit Fenster mit Nuimner 34305. 



85763512710481. 
85764391244491. 



To make the output easier to understand we do not print (-Fp±i mod p"^) which is 
automatically divisible by p but R{p) := (ip±i mod p^)/p G Z/pZ. Such a quotient 
may be as large as p. We output only those which fall into (—10^, 10^) which is very 
small in comparison to p. 

The data above were generated by a process which had started at 8-10^^ and worked 
on the primes p = l (mod 10). Till 8.5765 • 10^^, it found 10 792 primes p such that 
R{p) = mod e (-10^ 10^). 
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On the other hand, assuming equidistribution we would have predicted to find such 
a particularly small quotient for around 

E ^-^7-^ ^ (2 . 10^ - 1) • • (log(log(8.5765 • 10^^)) _ log(log(8 • 10^^))) 

p=8-10i^ 

psl (mod 10) 2-10'''— 1 10 10 

P prime = ^ (log(log(8.5765 • 10^^)) - log(log(8 • 10^3))) 

fa 10 856.330 

primes which is astonishingly close to the reality. 

Among the 10 792 small quotients found within this interval, the absolutely smallest 
one is i?(82 789 107950 701) = -42. We find 1074 quotients of absolute value less 
than 1 000 000, 98 quotients of absolute value less than 100 000, and 10 of absolute 
value less than 10 000. These are, besides the one above, 

i?(80 114 543 961 461) = -2437, 

i?(80 607 583 847 341) = -6949, 
i?(80 870 523 194 401) = -5751, 
i?(81232 564 906 631) = 3579, 
i?(81 916 669 933 751) = -2397, 
i?(83 575 544 636 251) = -1884, 
i?(84688857018011) = -1183, 
i?(84 771692 838 421) = 2281, 
i?(85 325 902 236 661) = -4473. 

There have been 5 235 positive and 5 557 negative quotients detected. 

Remarks 7.3. a) We note explicitly that this is not at all a constructed example. 
One may basically consider every interval which is not too small and will observe 
the same phenomena. 

b) Being very sceptical one might raise the objection that the computations done 
in our program do not really prove that the 10 792 numbers p which appear in the 
data are indeed prime. 

It is, however, very unlikely that one of them is composite as they all passed two tests. 
First, they passed the sieve which in this case makes sure they have no prime divi- 
sor < 8 302 871. This means, if one is composite then it decomposes into the product 
of two almost equally large primes. Furthermore, they were all found probably prime 
by the Fibonacci composedness test 

It is easy to check primality for all of them by a separate program. 
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7.4. Statistical Observations. A more spectacular interval is [0, 10^^]. One may 
expect a lot more small quotients as all small prime numbers are taken into consid- 
eration. 

Here, we may do some statistical analysis on the small positive values of the quo- 
tient R! which is given by B!{p) := (-Fp_i mod p^)/p for p = ±1 (mod 5) and by 
R'{p) := (-^2p+2 mod p'^)/p for p = ±2 (mod 5). 

Our computations show that there exist 96 909 quotients less than 100 000, 
12162 quotients less than 10 000, 1580 quotients less than 1000, 216 quotients 
less than 100, and 30 quotients less than 10. The latter ones are 
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163 ist eine interessante Primzahl. Quotient 6 
199 ist eine interessante Primzahl. Quotient 5 
239 ist eine interessante Primzahl. Quotient 5 
701 ist eine interessante Primzahl. Quotient 5 
941 ist eine interessante Primzahl. Quotient 6 
997 ist eine interessante Primzahl. Quotient 3 
1063 ist eine interessante Primzahl. Quotient 2 
1621 ist eine interessante Primzahl. Quotient 2 
2003 ist eine interessante Primzahl. Quotient 1 
27191 ist eine interessante Primzahl. Quotient 8 
86813 ist eine interessante Primzahl. Quotient 6 
123863 ist eine interessante Primzahl. Quotient 2 
199457 ist eine interessante Primzahl. Quotient 7 
508771 ist eine interessante Primzahl. Quotient 2 
956569 ist eine interessante Primzahl. Quotient 4 
1395263 ist eine interessante Primzahl. Quotient 3 
1677209 ist eine interessante Primzahl. Quotient 1 
3194629 ist eine interessante Primzahl. Quotient 5 
11634179 ist eine interessante Primzahl. Quotient 2 
467335159 ist eine interessante Primzahl. Quotient 4 
1041968177 ist eine interessante Primzahl. Quotient 6 
6_71661_90593 ist eine interessante Primzahl. Quotient 1 

Except for and 9, all one-digit numbers do appear. 

Further, the counts are again well within the expected range. For example, con- 
sider one-digit numbers. R{3) and R{7) are automatically one-digit. Therefore, the 
expected count is 

10 

2 + ^— ^2 + 10- (log(log 10^2) - log(log 10)) ^ 26.849 066 

p=10 P 
p prime 

which is surprisingly close the 30 one-digit quotients which were actually found. 

We note that already for two-digit quotients, it is no longer true that they ap- 
pear only within the subinterval [0,10^^]. In fact, there are twelve prime num- 
bers p e [10^^ 10^^] such that R^p) < 100. These are the following. 
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101876918491 liefert 87 

115301883659 liefert 60 
129316722167 liefert 44 
147486235177 liefert 59 
170273590301 liefert 78 
233642484991 liefert 89 
261836442223 liefert 45 
277764184829 liefert 64 
283750593739 liefert 37 
305128713503 liefert 93 
334015396151 liefert 79 
442650398821 liefert 74 

Once again, we may compare this to the expected count which is here 

10i2 

100 

^ ?a 100 • (log(log 10^2) - log(loglO^^)) ^ 8.701 13773. 

p=10" ^ 
p prime 
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